This report is for ETC5521 Assignment 1 by Team EMU, comprising Min Min Soh, Rohan Baghel, Xiyun Zhou, and Zhang Zhang.

1 Introduction and motivation

Global fishing has received a great deal of media attention over the past decades. The growth of the world population has increased the demand for seafood across the world. Meanwhile, the health of the world's fish populations remains a concern as global fishing expands. Fish farming (or ‘aquaculture’) contributes to seafood production while alleviating the pressure on wild fisheries.

This study is based on data sets about global fishing. We explore these data by analyzing the performance of each country/region in terms of consumption, production and fish stock level. Moreover, we compare different data sets to reveal unexpected findings.

We begin by describing the data in the next section, including the way we source it and prepare it for analysis.

In the analysis section, we present our observations through graphical displays. Our main tool is R, a programming language for statistical computing and graphics.

2 Data description

The data were obtained from the tidytuesdayR R package and are also available through the website https://ourworldindata.org. The data comprise four files in the machine-readable “.csv” format, which can be used to analyze the state of fish production and consumption in the world. The data dictionaries below define the variables in each data set and their types.

2.1 capture-fisheries-vs-aquaculture.csv

2.1.1 ‘captured_vs_farmed’ - Data Dictionary

Entity: Country/entity
Code: Country code
Year: Year
Aquaculture production (metric tons): Production of aquaculture
Capture fisheries production (metric tons): Production of capture fisheries

The data has 5 variables and 14,674 observations. The ‘Entity’ and ‘Code’ variables are character, while the others are numeric.

2.1.2 ‘captured_vs_farmed’ - Data quality Check

8.1% of the values in the data are missing. Most of them are in the aquaculture production and ‘Code’ variables.
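A missing-value share of this kind can be computed along the following lines. This is a minimal sketch on a small made-up data frame: the values and the shortened column names `aquaculture` and `capture` are illustrative assumptions, not the real data, and the report itself may have used a different tool for this check.

```r
# Toy stand-in for 'captured_vs_farmed'; all values are invented for illustration.
toy <- data.frame(
  Entity      = c("A", "A", "B", "B"),
  Code        = c("AAA", "AAA", NA, NA),
  Year        = c(2000, 2001, 2000, 2001),
  aquaculture = c(10, NA, 5, 7),
  capture     = c(100, 120, 50, 55)
)

# Overall share of missing cells, in percent.
overall_missing <- mean(is.na(toy)) * 100

# Share of missing values per variable, in percent.
missing_by_var <- colMeans(is.na(toy)) * 100
```

Here `overall_missing` summarises the whole table, while `missing_by_var` shows which variables the missing values are concentrated in.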

2.2 fish-and-seafood-consumption-per-capita.csv

2.2.1 ‘consumption’ - Data Dictionary

Entity: Country/entity
Code: Country code
Year: Year
Fish, Seafood- Food supply quantity (kg/capita/yr) (FAO, 2020): Food supply in fish

As we can see, there are 4 variables and 11,028 observations in total. The ‘Entity’ and ‘Code’ variables are character, while the others are numeric.

2.2.2 ‘consumption’ - Data quality Check

15.72% of values in ‘Code’ are missing. This does not affect our analysis.

2.3 fish-stocks-within-sustainable-levels.csv

2.3.1 ‘stock’ - Data Dictionary

Entity: Country/entity
Code: Country code
Year: Year
Share of fish stocks within biologically sustainable levels: Share of sustainable fish stock
Share of fish stocks that are overexploited: Share of fish stock that is overexploited

‘stock’ has 5 variables and 51 observations. The ‘Entity’ and ‘Code’ variables are character, while the others are numeric.

2.3.2 ‘stock’ - Data quality Check

As with ‘consumption’, all missing values are in ‘Code’.

2.4 seafood-and-fish-production-thousand-tonnes.csv

2.4.1 ‘production’ - Data Dictionary

Entity: Country/entity
Code: Country code
Year: Year
Pelagic Fish - 2763 - Production - 5510 - tonnes: Pelagic Fish
Crustaceans - 2765 - Production - 5510 - tonnes: Crustaceans
Cephalopods - 2766 - Production - 5510 - tonnes: Cephalopods
Demersal Fish - 2762 - Production - 5510 - tonnes: Demersal
Freshwater Fish - 2761 - Production - 5510 - tonnes: Freshwater
Molluscs, Other - 2767 - Production - 5510 - tonnes: Molluscs
Marine Fish, Other - 2764 - Production - 5510 - tonnes: Marine

There are 10 variables and 10,326 observations. The ‘Entity’ and ‘Code’ variables are character, while the others are numeric.

2.4.2 ‘production’ - Data quality Check

As we can see, only ‘Entity’ and ‘Code’ do not have missing values.

3 Expected findings

  1. What are the notable findings when we compare consumption with production?
  2. Comparing the production of 7 different fish types, which type of seafood contributes the most to meeting the increasing demand?
  3. What have been the performances of the top 10 seafood producers?
  4. What has been the trend of fish stocks in the world?

4 Analysis and findings

4.1 What are the notable findings when we compare consumption with production?

4.1.1 Method

Data cleaning: we use the left_join() function to combine the ‘consumption’ and ‘captured_vs_farmed’ data sets into one data set named con_cap. To get the total production, we add a new variable that is the sum of aquaculture production and capture production. We use the new data set to draw the line plots and the scatter plot for question one.
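The join and the derived total can be sketched as follows. This is a minimal illustration on invented toy data, not the report's actual code: the shortened column names (`consumption_kg`, `aquaculture`, `capture`, `total_production`) and the choice of join keys are assumptions.

```r
library(dplyr)

# Toy stand-ins for the two data sets; all values are invented.
consumption <- tibble(
  Entity = c("A", "A", "B"),
  Code   = c("AAA", "AAA", "BBB"),
  Year   = c(2000, 2001, 2000),
  consumption_kg = c(10.5, 11.0, 20.0)
)
captured_vs_farmed <- tibble(
  Entity = c("A", "A", "B"),
  Code   = c("AAA", "AAA", "BBB"),
  Year   = c(2000, 2001, 2001),
  aquaculture = c(5, 6, 1),
  capture     = c(50, 55, 9)
)

# Keep every consumption row and attach production where
# Entity, Code and Year all match (assumed join keys).
con_cap <- consumption %>%
  left_join(captured_vs_farmed, by = c("Entity", "Code", "Year")) %>%
  mutate(total_production = aquaculture + capture)
```

Because left_join() keeps unmatched rows from the left table, a consumption row with no production record (entity B in 2000 here) stays in `con_cap` with missing production values.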

4.1.2 Visualization and analysis

Figure 4.1 depicts the growing trend in world seafood consumption from 1969 to 2018. Average yearly seafood consumption per person was 13.8 kg at the start of the period and has increased to 20 kg in recent years. Between 1990 and 1992, however, average seafood consumption clearly decreased, mostly because of economic problems in low-income nations such as those in Africa, Latin America, the Caribbean, and the Near East, which put increased pressure on the prices of many items (Dumas, 1992). Seafood consumption grew globally again as the crisis subsided. Overall, this is consistent with our assumption that seafood consumption would continue to rise.

The line plot in Figure 4.2 of world yearly seafood production shows an upward trend from 1969 to 2018. In 1970, the average production of seafood was 118 million metric tons, and production has become three times greater in recent years. Production also dipped over the same period in which seafood consumption decreased. This suggests a positive relationship between seafood consumption and production.

Figure 4.1: Average seafood consumption in the world over time

Figure 4.2: Average yearly seafood production in the world over time

Figure 4.3: Contribution of top 10 seafood producers, 2015-2018 (metric tons)

The bar plot in Figure 4.3 shows the contributions of the top 10 seafood producers by country. China is the largest contributor and has provided 40% of the world's seafood in recent years. Indonesia and India produce 11% and 5.6% respectively. It is intriguing that, apart from the US (3%) and Norway (1%), the other major producers are from Asia.

Figure 4.4: Relationship between seafood consumption and production

To explore the relationship between seafood consumption and seafood production, we made the scatter plot matrix in Figure 4.4. Since seafood is produced in two ways, aquaculture and capture, we added two further variables, Aquaculture production and Capture production, to the plot to look for interesting relationships between the variables. The outcome shows a strong positive correlation between each pair of variables. As we expected, there is a strong positive linear relationship between average consumption and total production: the correlation coefficient is 0.97, close to 1. Moreover, the two production methods also have a positive linear relationship.
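The pairwise correlations behind a scatter plot matrix like this can be computed with base R's cor(). The sketch below uses invented yearly values and assumed column names, so its coefficients are not the ones reported here.

```r
# Toy yearly world averages; all numbers are invented for illustration.
world <- data.frame(
  consumption = c(10, 12, 13, 15, 18),
  aquaculture = c(2, 4, 5, 8, 12),
  capture     = c(50, 55, 57, 60, 66)
)
world$total_production <- world$aquaculture + world$capture

# Pairwise Pearson correlations, dropping rows with missing values.
cor_matrix <- cor(world, use = "complete.obs")

# Correlation between consumption and total production.
r_cons_total <- cor_matrix["consumption", "total_production"]
```

Calling `pairs(world)` on the same data frame draws a base R scatter plot matrix of these variables.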

4.1.3 Summary and interesting findings

  1. Seafood production and consumption both show an upward trend worldwide.
  2. There is a positive linear relationship between seafood production and consumption.
  3. The two seafood production methods, aquaculture and capture, have a positive linear relationship.
  4. Asia is the major seafood-producing continent, and China produces the most seafood.

4.2 Comparing the production of 7 different fish types, which type of seafood contributes the most to meeting the increasing demand?

4.2.1 Method

We want to look for interesting patterns when comparing production across fish types.

We use the ‘production’ data set and simplify the names of the variables for the 7 types of fish, since they are too long and hard to read. We add up the fish types to get a total production variable. We also want to know which country contributes the most for each type of fish. To find out, we summarise the average yearly production of each type of fish by country and then use the max() function to find the answer.

To compare the production of each type of fish, we use a column plot, which not only shows the trend but also makes the differences easy to see. We group the years into 4 decades and compute the average production of each type of fish per decade. We then use pivot_longer() to gather the fish types into one column in preparation for the column plot.
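The decade averaging and reshaping steps can be sketched as follows. This is an illustration under assumptions rather than the report's code: the toy values, the shortened column names `pelagic` and `freshwater`, and the rule of mapping each year to the start of its calendar decade are all invented for the example.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for 'production' with shortened names; values are invented.
production <- tibble(
  Entity = "World",
  Year   = c(1981, 1989, 1995, 2005),
  pelagic    = c(10, 20, 30, 25),
  freshwater = c(1, 5, 9, 15)
)

production_long <- production %>%
  # Map each year to the start of its decade, e.g. 1989 -> 1980 (assumed rule).
  mutate(decade = floor(Year / 10) * 10) %>%
  group_by(decade) %>%
  # Average each fish type within the decade.
  summarise(across(c(pelagic, freshwater), ~ mean(.x, na.rm = TRUE)),
            .groups = "drop") %>%
  # Gather the fish-type columns into a single column.
  pivot_longer(c(pelagic, freshwater),
               names_to = "fish_type", values_to = "avg_production")
```

The long table has one row per decade and fish type, which is the shape a column plot such as ggplot2's geom_col() expects when fish type is mapped to fill or facets.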

Figure 4.5: Compare Fish Type production

4.2.2 Visualization and analysis

From the column plot in Figure 4.5 we can see that Pelagic Fish has the largest production of all the fish types. Demersal Fish used to be the second largest contributor, but Freshwater Fish has taken second place in recent years. The overall trend for each type of fish is increasing over time, although the production of Pelagic Fish decreased in the most recent decade and Demersal Fish stayed at almost the same level.

The table below shows the maximum yearly production of the 7 fish types and the entity that produced it. The Americas have the largest production of Pelagic Fish. Asia contributes the most Crustaceans, Cephalopods, Freshwater Fish, Molluscs and Marine Fish, and Europe contributes the most Demersal Fish.

4.2.2.1 Maximum yearly production of 7 different fish types

Fish Type         Entity     Production (metric tons)
Pelagic Fish      Americas   12495788
Crustaceans       Asia       3748006
Cephalopods       Asia       1578872
Demersal Fish     Europe     7979733
Freshwater Fish   Asia       14040488
Molluscs          Asia       6115966
Marine Fish       Asia       6597212

4.2.3 Summary and interesting findings

  1. Pelagic fish accounts for the largest share of fish production.
  2. Asia contributes the most to fish production.

4.3 What have been the performances of the top 10 seafood producers?

4.3.1 Method:

We narrow the analysis to the top 10 seafood producers and examine the data from a wider perspective.

Here we use the ‘captured_vs_farmed’ data set. The first reason is that ‘captured_vs_farmed’ contains fewer NA values than ‘production’. Secondly, we want to focus on the trend of captured versus farmed seafood production for the top producers.

In addition, using sorted data from the original report simplified the process considerably. These data include ‘country_cf’, ‘dat’, ‘world_cap_farm’ and ‘captured_vs_farmed_long’.

To answer this question, we took the following steps:

1 We worked out the overall top 10 producers between 2015 and 2018, as well as the top 10 in terms of captured and farmed fish during the same period. We took the average production of each country over 2015-2018 because there are many missing values in 2018. This gives a more accurate picture of the most recent seafood productivity of each country.

2 Next, we worked out the top 10 producers of the 1960s using data between 1960 and 1970, again because of missing values. We then compared the top 10 of the 1960s with the list for the 2010s.

3 We looked at the percentage contribution of each of the top producers.

4 Lastly, we examined how production changed between the 1960s and the 2010s.
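The first and third steps above can be sketched as follows: average each entity's total production over a window of years, convert the averages to percentage shares, and keep the largest producers. The toy values, the shortened column names and the helper variable `top_n_producers` are assumptions made for the illustration.

```r
library(dplyr)

# Toy stand-in for 'captured_vs_farmed'; all values are invented.
captured_vs_farmed <- tibble(
  Entity = rep(c("A", "B", "C"), each = 2),
  Year   = rep(c(2017, 2018), times = 3),
  aquaculture = c(10, NA, 3, 4, 0, 1),
  capture     = c(20, NA, 30, 32, 5, 6)
)

top_n_producers <- 2  # the report uses 10; 2 keeps the toy output small

top_producers <- captured_vs_farmed %>%
  filter(Year >= 2015, Year <= 2018) %>%
  mutate(total = aquaculture + capture) %>%
  group_by(Entity) %>%
  # Averaging over the window tolerates years with missing values.
  summarise(avg_total = mean(total, na.rm = TRUE), .groups = "drop") %>%
  # Share of each entity in the summed averages, in percent.
  mutate(share_pct = 100 * avg_total / sum(avg_total)) %>%
  slice_max(avg_total, n = top_n_producers)
```

Entity A has no 2018 value in the toy data but is still ranked on its 2017 figure, which is the point of averaging over several years. On the real data, aggregate entities such as continents or the world total would presumably need to be excluded before ranking countries.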

4.3.2 Visualization and analysis

From the 3 plots above, we can see that China and Indonesia are the top 2 producers in each of the three rankings. China holds the dominant position in seafood production.

In addition, the U.S. and Peru are among the overall top producers but do not appear in the aquaculture ranking. Similarly, Philippines and Bangladesh are not listed among the top producers of captured seafood. To sum up, only six countries show strong productivity in both captured and farmed production.

Comparing the top 10 producers of the 1960s with those of the 2010s (Graph.5d and Graph.5a), we can see that three developed countries, South Africa, Canada and Spain, were in the top 10 in the 1960s but do not appear on the 2010s list. On the other hand, three developing countries, Vietnam, Philippines and Bangladesh, are new to the 2010s ranking. Geographically, all the newcomers in the 2010s are located in South and South-east Asia.

Moreover, considering what we observed in Graph.5a-5c, high aquaculture production appears to be the main reason why Vietnam, Philippines and Bangladesh rank among the top 10 producers in the 2010s.

Graph.5h shows that both China and Indonesia increased their seafood production more than 20-fold from 1960 to 2018. They are followed by the tier 2 countries, Vietnam, India, Bangladesh and Philippines, with large increases of 5 to 10 times. The U.S. and Norway showed rises of 110% and 59% respectively.

On the other hand, Peru and Japan experienced considerable declines in seafood production, of 37% and 41% respectively. However, they are still among the top 10 producers in the world.
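For reference, a percentage change and a fold change between an early-period average and a late-period average can be computed as below. The helper names `pct_change` and `fold_change` and the input numbers are invented for illustration; they are not taken from the data or from the report's code.

```r
# Percentage change from an early-period value to a late-period value.
pct_change <- function(early, late) {
  100 * (late - early) / early
}

# Fold change: how many times larger the late value is than the early one.
fold_change <- function(early, late) {
  late / early
}

rise     <- pct_change(early = 50, late = 105)   # a rise of 110%
decline  <- pct_change(early = 100, late = 59)   # a decline of 41%
twenty_x <- fold_change(early = 5, late = 100)   # a 20-fold increase
```

A 20-fold increase corresponds to a percentage change of 1900%, which is why very large increases are easier to read as fold changes.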

Here we explore further what happened to Japan and Peru. As we can see from Graph.5i, the farmed production of these two countries remained at the same level from 1960 to 2018, while captured production declined considerably from its peak.

Moreover, comparing Graph.5i with Graph.5h, it is clear that the countries with a relatively low increase (the U.S., Norway) or a decline (Peru, Japan) in total production all show poor development in farmed production. In other words, those four countries produce more captured fish than farmed seafood.

In comparison, the other six countries, with relatively higher overall increases, experienced a significant upward trend in farmed production. In these six countries, aquaculture production is higher than captured production.

4.3.3 Summary and interesting findings:

1 Developing countries take more positions in the top 10 seafood producers than in the 1960s.

2 All the newcomers on the top 10 list in the 2010s are from South and South-east Asia.

3 The center of the world’s seafood production is now in the West Pacific and Southeast Pacific.

4 Farmed production contributes more than captured seafood to the improvement of countries’ overall productivity.

5 China and Indonesia are the top 2 seafood producers in total production as well as in the farmed and captured rankings. Over the last 60 years, these two countries experienced the most significant upward trend in seafood production.

6 Peru and Japan were the top 2 countries in the 1960s. However, they were the only two countries in the top 10 list whose seafood production decreased over the last 60 years.

4.4 What has been the trend of fish stocks in the world?

4.4.1 Method:

Here we use the ‘stock’ and ‘consumption’ data. We not only analyse the data within ‘stock’, but also compare ‘stock’ with ‘consumption’.

As before, we used some cleaned data from the original report, including ‘consumption’ and ‘annual_consumption’.

1 We first plotted a graph to compare sustainable and overexploited stock in terms of trend and proportion.

2 The previous analysis reveals the world’s trend by stock type. Here we go further by analyzing the trend in each region.

3 We also compare the world’s trend in overexploited stock with the trend in world seafood consumption. A considerable decline in seafood consumption around 1989 was observed in the previous analysis. By comparing these two data sets, we would like to see whether any similar pattern exists.

4.4.2 Visualization and analysis

Sustainable stock clearly always makes up a larger proportion than overexploited stock.

Although the level of overexploited stock has increased, with fluctuations, over time, it has never exceeded 40% of the total.

Here we compared the proportion of overexploited stock with the world seafood consumption.

We plotted 8 vertical lines. Each of them represents the start or end point of a particular upward or downward trend, and vertical lines with the same color correspond to each other across the graphs. Since a major decline in consumption in 1989 was observed in the previous analysis, we set 1989 as the first point.

Graph.6b and Graph.6c show that the trends of overexploited stock and world seafood consumption follow a similar pattern. For each individual trend between two lines in Graph.6a, we can find a similar trend between the two lines of the corresponding color in Graph.6b.

Interestingly, most of the lines in Graph.6b (except for the trend between yellow and purple) are to the left of the lines of the same color in Graph.6c. This implies that the trend in consumption generally lags behind the corresponding trend in overexploited stock.
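The lag here was read visually from the matched vertical lines. One possible numerical cross-check, which is not the method used in this report, is to correlate one series against shifted copies of the other and see which shift fits best. The sketch below uses invented series in which consumption repeats the stock series one step later; the helper `lagged_cor` is hypothetical.

```r
# Correlation between x[t] and y[t + k] for a non-negative lag k.
lagged_cor <- function(x, y, k) {
  n <- length(x)
  cor(x[1:(n - k)], y[(1 + k):n])
}

# Invented share of overexploited stock over 8 periods.
overexploited <- c(10, 12, 15, 14, 18, 21, 20, 24)
# Toy consumption series that repeats the stock series one step later.
consumption <- c(9, overexploited[1:7])

lags <- 0:2
cors <- sapply(lags, function(k) lagged_cor(overexploited, consumption, k))
best_lag <- lags[which.max(cors)]
```

In this toy case `best_lag` is 1, meaning consumption follows the stock series by one period. With only a few dozen yearly observations, such an estimate would be rough and should be read alongside the plots rather than in place of them.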

Moreover, Graph.6d indicates that sustainable stock has a strong negative correlation with seafood consumption, which further supports our observations from Graph.6b and 6c.

Next, we plotted graphs to check the stock level based on regions.

As we can see from Graph.6f, the world’s sustainable stock declined by 1.2% between 2015 and 2017. The main contributors to this drop are the decreases in sustainable stock in the Northwest Pacific and Northwest Central Atlantic, of 21% and 24% respectively.

In comparison, Southeast Pacific, Southwest Pacific and Southwest Atlantic experienced the largest increase in sustainable stock.

It is worth noting that the stock levels of the Southeast Central Atlantic, Eastern Central Pacific and Eastern Central Atlantic were almost unchanged.

4.4.3 Summary and interesting findings:

1 The level of overexploited stock has increased, with fluctuations, over time.

2 There is a similar pattern between the trends of overexploited stock and world seafood consumption since 1989.

3 Compared with the stock level, world consumption shows a lag in each similar trend.

4 Some regions show a very stable stock level between 2015 and 2017.

5 Conclusion

Seafood consumption and production have kept increasing over time. Moreover, there is a notably strong positive linear relationship between fish consumption and production. China produces the most seafood worldwide, and Asia is the main seafood-producing continent. Pelagic fish has the highest production, and most of it comes from the Americas.

Compared with the 1960s, developing countries from South and South-east Asia take more places among the top 10 seafood producers in the 2010s. Also, fish farming is the most important factor in the improvement of overall productivity. Moreover, we observe that overexploited stock has a strong positive correlation with world fish consumption.

6 Limitation and recommendation

1 Because of missing values and incomplete data in ‘production’ and ‘captured_vs_farmed’, the visualizations in this study may lack accuracy.

2 Only 2 years of data are available for most entities in the ‘stock’ data set, which makes it difficult to analyze long-term performance.

3 There is no clear definition of the ‘Entity’ values in the ‘stock’ data, so we are unable to link those entities to countries for a deeper analysis.

4 Future research would benefit from the use of more data sets to broaden the perspective of the analysis. For example, data on stock levels for each country may help us gain more insight into sustainable fishing.

7 References

Becker RA, Wilks AR, Brownrigg R, Minka TP, Deckmyn A (2021). maps: Draw Geographical Maps. R package version 3.4.0, https://CRAN.R-project.org/package=maps.

Sievert C (2020). Interactive Web-Based Data Visualization with R, plotly, and shiny. Chapman and Hall/CRC, Florida.

Dumas, M. W. (1992). Productivity trends: prepared fish and seafoods industry. Monthly Lab. Rev., 115, 3.

Hughes E (2022). tidytuesdayR: Access the Weekly ‘TidyTuesday’ Project Dataset. R package version 1.0.2, https://CRAN.R-project.org/package=tidytuesdayR.

Wickham H (2016). ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag, New York.

Ooms J (2022). gifski: Highest Quality GIF Encoder. R package version 1.6.6-1, https://CRAN.R-project.org/package=gifski.

Pedersen T, Robinson D (2020). gganimate: A Grammar of Animated Graphics. R package version 1.0.7, https://CRAN.R-project.org/package=gganimate.

Pedersen T (2020). patchwork: The Composer of Plots. R package version 1.1.1, https://CRAN.R-project.org/package=patchwork.

Pedersen T (2020). transformr: Polygon and Path Transformations. R package version 0.1.3, https://CRAN.R-project.org/package=transformr.

Garnier S, Ross N, Rudis R, Camargo AP, Sciaini M, Scherer C (2021). Rvision - Colorblind-Friendly Color Maps for R. R package version 0.6.2.

Subasinghe, R., Soto, D., & Jia, J. (2009). Global aquaculture and its role in sustainable development. Reviews in aquaculture, 1(1), 2-9.

Tierney N (2017). “visdat: Visualising Whole Data Frames.” JOSS, 2(16), 355. doi:10.21105/joss.00355, https://doi.org/10.21105/joss.00355.

Urbanek S (2013). png: Read and write PNG images. R package version 0.1-7, https://CRAN.R-project.org/package=png.

Wickham H, Averick M, Bryan J, Chang W, McGowan LD, François R, Grolemund G, Hayes A, Henry L, Hester J, Kuhn M, Pedersen TL, Miller E, Bache SM, Müller K, Ooms J, Robinson D, Seidel DP, Spinu V, Takahashi K, Vaughan D, Wilke C, Woo K, Yutani H (2019). “Welcome to the tidyverse.” Journal of Open Source Software, 4(43), 1686. doi:10.21105/joss.01686 https://doi.org/10.21105/joss.01686.

Wickham H, Hester J, Bryan J (2022). readr: Read Rectangular Text Data. R package version 2.1.2, https://CRAN.R-project.org/package=readr.

Wilke C (2021). ggridges: Ridgeline Plots in ‘ggplot2’. R package version 0.5.3, https://CRAN.R-project.org/package=ggridges.

Xie Y (2022). bookdown: Authoring Books and Technical Documents with R Markdown. R package version 0.26.

Zhu H (2021). kableExtra: Construct Complex Table with ‘kable’ and Pipe Syntax. R package version 1.3.4, https://CRAN.R-project.org/package=kableExtra.